x <- rnorm(10000)
hist(x, 50)

Quarto Notebook Example
The source code for this notebook is in this GitHub folder.
Markup Languages
Markdown, \(\LaTeX\) and HTML are markup languages. Notebooks generally (including Jupyter and others) use Markdown so you can write explanations in a more readable format. With Quarto, you can also include \(\LaTeX\) in the Markdown and then render to HTML or PDF.
Markdown, R Markdown, Quarto
Markdown is very easy to learn; find a cheat sheet like this one for the basic syntax, such as
- lists
- bold, italics (also with underscores),
- code, strikethrough
- images
You can render notebooks (with the “Render” button) to a variety of formats, like PDF, HTML, slides or plain Markdown. The header here is set up to render to a standalone HTML file.
R Markdown adds some more functionality to plain Markdown, like code chunks
and inline R code: sin(1)=0.841471. In RStudio, you can create a new code chunk with Ctrl-Alt-I (Cmd-Option-I on macOS) on a blank line.
Quarto is the successor to R Markdown, made by Posit (formerly named RStudio). Older notebook files may have an “.Rmd” extension for R Markdown, while newer “.qmd” files are Quarto.
Quarto adds some fancier things like tabset panels:
x <- rnorm(10000)
hist(x, 50, main = "Normal")

x <- rgamma(10000, shape = 2)
hist(x, 50, main = "Gamma")

LaTeX with MathJax
MathJax is a JavaScript engine that renders \(\LaTeX\) math in the HTML output of R Markdown and Quarto. It doesn’t do everything that \(\LaTeX\) does, but it does enough for most purposes. Find some cheat sheets like this one or this one to learn how to do basics.
You can include inline math with a single $, like \(e^{\pi i}\), or you can use $$ for “display” math:
\[ \frac{d}{dx}\int_{a}^{x}f(t)\,dt=f(x) \]
Show multiple lines of work with the align environment.
\[ \begin{align*} \Gamma(n+1) & =\int_0^\infty x^{n}e^{-x}\,dx \\ & =-x^{n}e^{-x}\Bigg|_0^\infty+n\int_0^\infty x^{n-1}e^{-x}\,dx \\ & =n\cdot\Gamma(n) \end{align*} \]
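As a numerical sanity check on this recursion, here is a minimal sketch using base R's integrate() and gamma(). The value n = 5 and the tolerance are arbitrary choices for illustration, not part of the derivation.

```r
# Numerically check Gamma(n + 1) = n * Gamma(n) for one illustrative n.
n <- 5

# Gamma(n + 1) written as an integral over [0, Inf)
lhs <- integrate(function(x) x^n * exp(-x), lower = 0, upper = Inf)$value

# n * Gamma(n), with Gamma(n) also written as an integral
rhs <- n * integrate(function(x) x^(n - 1) * exp(-x), lower = 0, upper = Inf)$value

# Compare the two integrals with each other and with the built-in gamma()
isTRUE(all.equal(lhs, rhs, tolerance = 1e-4))
isTRUE(all.equal(lhs, gamma(n + 1), tolerance = 1e-4))
```

The boundary term from integration by parts vanishes at both limits, which is why only the integral term survives in the comparison.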
Use pmatrix to create matrices with parentheses; there are others like bmatrix for square bracket [] matrices. For example, if \(f_n\) is the \(n^{\textsf{th}}\) Fibonacci number, note that
\[ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f_{n} \\ f_{n-1} \end{pmatrix} =\begin{pmatrix} f_{n+1} \\ f_{n} \end{pmatrix}. \]
Let \(\phi_{\pm}=\tfrac12(1\pm\sqrt5)\) and diagonalize that matrix to show
\[ \begin{align*} \begin{pmatrix} f_{n+1} \\ f_{n} \end{pmatrix} & =\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} 1 \\ 1 \end{pmatrix} \\ & =\frac{1}{\sqrt{5}} \begin{pmatrix} \phi_+ & \phi_- \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \phi_+ & 0 \\ 0 & \phi_- \end{pmatrix}^n \begin{pmatrix} 1 & -\phi_- \\ -1 & \phi_+ \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \end{align*} \]
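The diagonalization can be checked numerically. The sketch below assumes the convention \(f_0=f_1=1\), which matches the starting vector \((1,1)\), and it puts the eigenvectors \((\phi_\pm,1)\) in the columns of a matrix P so that the Fibonacci matrix equals \(PDP^{-1}\). The variable names and n = 10 are illustrative choices.

```r
# Check A^n v = P D^n P^{-1} v for the Fibonacci matrix A and v = (1, 1).
phi_p <- (1 + sqrt(5)) / 2
phi_m <- (1 - sqrt(5)) / 2

A     <- matrix(c(1, 1, 1, 0), nrow = 2, byrow = TRUE)
P     <- matrix(c(phi_p, phi_m, 1, 1), nrow = 2, byrow = TRUE)   # eigenvectors as columns
P_inv <- matrix(c(1, -phi_m, -1, phi_p), nrow = 2, byrow = TRUE) / sqrt(5)

n   <- 10
D_n <- diag(c(phi_p^n, phi_m^n))   # nth power of the diagonal matrix

# Closed form via the diagonalization
closed_form <- as.vector(P %*% D_n %*% P_inv %*% c(1, 1))

# Direct computation by repeated multiplication
v <- c(1, 1)
for (i in seq_len(n)) v <- as.vector(A %*% v)

all.equal(closed_form, v)
```

The closed form needs only two scalar powers, while the loop performs n matrix-vector products; for small n both agree up to floating-point error.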
Macros
You can also use some \(\TeX\) macros like \def\eps{\varepsilon} in case you want a shortcut for \varepsilon=\(\varepsilon\) but don’t want to overwrite \epsilon=\(\epsilon\). However, \def commands like that can be a little odd in a notebook because that code doesn’t render to anything: \(\def\eps{\varepsilon}\).
It’s often better to use JavaScript, like the file “latex.js” in this directory, which is included in the header (see my notes and the MathJax documentation). I used it to define \R=\(\R\) to be \mathbb{R}.
R and the tidyverse
If you’re interested in working with data in R, it’s a good idea to read Hadley Wickham’s R for Data Science. The metapackage tidyverse includes most of the packages described there.
library(tidyverse)

Two of the main packages are
- ggplot2, based on the book The Grammar of Graphics, and
- dplyr, “a grammar of data manipulation”.
To get practice, it can help to know that R loads some dataframes automatically like mtcars from the old datasets package.
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
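To find other built-in dataframes to practice on, a short sketch like the following may help. It uses base R's data() to list what the datasets package provides and str() to summarize mtcars; the variable name available is just an illustrative choice.

```r
# List the names of datasets bundled with the 'datasets' package.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Summarize the columns and types of mtcars without printing every row.
str(mtcars)

# Number of rows and columns
dim(mtcars)
```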
Pipes
You should get used to the pipe operator |>. It’s two characters, a vertical line | and greater than >, but the Fira Code font combines them into something like \(\LaTeX\)’s \triangleright (\(\triangleright\)). There’s also an older operator %>%, from the magrittr package, that you can use in much the same way as |>.
The pipe operator is a simple syntax change: x |> f(y) is equivalent to f(x,y). It allows you to chain together a bunch of function compositions in a way that’s more readable.
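A minimal sketch of that equivalence, using only base R functions (the native pipe needs R 4.1.0 or later; the variable names are illustrative):

```r
# The native pipe |> requires R 4.1.0 or later.
x <- c(1.234, 5.678)

piped  <- x |> round(1)   # pipe form: x becomes the first argument
direct <- round(x, 1)     # ordinary call

identical(piped, direct)

# Chaining reads left to right instead of inside out.
chained <- c(1, 4, 9) |> sqrt() |> sum()
nested  <- sum(sqrt(c(1, 4, 9)))

identical(chained, nested)
```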
For example, in the next code chunk, starting from the mtcars dataframe, you
- select only the columns mpg and disp,
- add in the column nonsense by adding those two columns, and
- filter down to just the first 6 rows.
mtcars |>
  select(mpg, disp) |>
  mutate(nonsense = mpg + disp) |>
  head()

                   mpg disp nonsense
Mazda RX4         21.0  160    181.0
Mazda RX4 Wag     21.0  160    181.0
Datsun 710        22.8  108    130.8
Hornet 4 Drive    21.4  258    279.4
Hornet Sportabout 18.7  360    378.7
Valiant           18.1  225    243.1
Without |>, you have to read the function composition from the inside:
head(
  mutate(
    select(mtcars, mpg, disp),
    nonsense = mpg + disp
  )
)

                   mpg disp nonsense
Mazda RX4         21.0  160    181.0
Mazda RX4 Wag     21.0  160    181.0
Datsun 710        22.8  108    130.8
Hornet 4 Drive    21.4  258    279.4
Hornet Sportabout 18.7  360    378.7
Valiant           18.1  225    243.1
The pipe |> is built into base R (version 4.1.0 and later), while the older %>% comes from the magrittr package. Both are most often used to chain together functions from dplyr, which was loaded with the library(tidyverse) command and also makes %>% available.
Plots
The next code chunk shows a simple scatter plot using the ggplot2 package, included in the tidyverse. It just consists of
- an aesthetic mapping, specifying that the mpg column goes to the x-axis and disp goes to the y-axis, and
- a geom, specifying that records will be displayed as points.
mtcars |>
  ggplot(aes(x = mpg, y = disp)) +
  geom_point()

You can add color with a color aesthetic mapping. In the geom, I also specify that all points have a bigger size and some transparency.
mtcars |>
  ggplot(aes(mpg, disp, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.6)

Making a plot is generally an iterative process. Maybe this one is good enough.
mtcars |>
  mutate(cyl = factor(cyl)) |>
  ggplot(aes(mpg, disp, color = cyl, group = cyl)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = 'lm', formula = y~x, se = FALSE) +
  theme_bw() +
  theme(legend.position = "inside", legend.position.inside = c(0.85, 0.7)) +
  ggtitle("disp vs mpg, grouped by cyl",
          subtitle = "with linear regression lines")

You can also make plots more interactive with packages like plotly.

library(plotly)

That package has a function ggplotly() that translates static ggplot objects to plotly plots.
p <-
  mtcars |>
  mutate(cyl = factor(cyl)) |>
  ggplot(aes(mpg, disp, color = cyl, group = cyl)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = 'lm', formula = y~x, se = FALSE) +
  theme_bw() +
  theme(legend.position = "inside", legend.position.inside = c(0.85, 0.7)) +
  ggtitle("disp vs mpg, grouped by cyl",
          subtitle = "with linear regression lines")
ggplotly(p)

As you may notice, ggplotly() doesn’t translate things 100%: the legend above is outside the grid and there’s no subtitle. It generally does pretty well, and you can use the internet to fix the rest.
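One possible workaround, sketched here under the assumption that the ggplot2 and plotly packages are installed, is to set the title and legend position on the plotly side with plotly's layout() function. The HTML tags in the title text and the legend coordinates (fractions of the plotting area) are illustrative choices, and the plot is a simplified version of the one above.

```r
library(ggplot2)
library(plotly)

# A simplified version of the static plot
p <- ggplot(mtcars, aes(mpg, disp, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_bw()

# Convert to plotly, then add the title, subtitle and legend position there.
fig <- ggplotly(p) |>
  plotly::layout(
    title = list(text = paste0(
      "disp vs mpg, grouped by cyl",
      "<br><sup>with linear regression lines</sup>"   # subtitle as small text
    )),
    legend = list(x = 0.85, y = 0.7)   # position inside the plotting area
  )

# Print `fig` (or leave it as the last line of a code chunk) to display it.
```

The values may need tuning for a given figure size, since plotly positions the legend relative to the plotting area rather than reusing the ggplot2 theme settings.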